The aim of this paper is to establish several deep theoretical properties of principal component analysis for multiple-component spike covariance models. Our new results reveal a surprising asymptotic conical structure in critical sample eigendirections under the spike models with distinguishable (or indistinguishable) eigenvalues, when the sample size and/or the number of variables (or dimension) tend to infinity. The consistency of the sample eigenvectors relative to their population counterparts is determined by the ratio between the dimension and the product of the sample size with the spike size. When this ratio converges to a nonzero constant, the sample eigenvector converges to a cone, with a certain angle to its corresponding population eigenvector. In the High Dimension, Low Sample Size case, the angle between the sample eigenvector and its population counterpart converges to a limiting distribution. Several generalizations of the multi-spike covariance models are also explored, and additional theoretical results are presented.

1. Introduction. Principal Component Analysis (PCA) is one of the most important visualization and dimension reduction tools. The theoretical properties of PCA, including the sample eigenvalues, eigenvectors, and PC scores, have been widely studied in different settings, as the sample size and/or the dimension increase to infinity. For example, Anderson (1963) [1] studied such properties under the classical statistical setting with $n \to \infty$ and a fixed dimension $d$. Johnstone and Lu (2009) [9] explored such properties under the random matrix setting with sample size $n \to \infty$ and $d \sim n$. Jung and Marron (2009) [10] derived such properties in a High Dimension, Low Sample Size (HDLSS) context, with a fixed $n$ and $d \to \infty$. More recently, Fan et al. (2013) [7] considered scenarios where the first few leading eigenvalues increase to $\infty$ together with $d$. See additional theoretical results in [2-4, 11-15, 17, 19] and references therein.

Generally speaking, the existing results indicate that the behavior of PCA strongly depends on the relationship among three key quantities: the dimension, the sample size, and the spike sizes (the relative sizes of the population eigenvalues $\{\lambda_j\}$). For instance, Shen et al. (2012) [17] systematically investigated the theoretical properties of the $j$th sample eigenvector and eigenvalue as $d/(n\lambda_j) \to 0$ or $\infty$. Specifically, as $d/(n\lambda_j) \to 0$, the $j$th sample eigenvector converges to the corresponding population eigenvector, whereas strong inconsistency follows as $d/(n\lambda_j) \to \infty$.

An interesting open question, and the aim of this paper, is to investigate the asymptotic properties of PCA when $d/(n\lambda_j)$ converges to a constant $c_j \in (0, \infty)$. We study PCA within a broad theoretical framework that ranges from the classical setting, through random matrix theory, and on to HDLSS. First, we show a new instance of unexpected asymptotic behavior of sample eigenvectors. Specifically, the critical sample eigenvectors lie in a right circular cone around the corresponding population eigenvectors. Although these sample eigenvectors converge to the cone, their locations within the cone are random. The angles of these cones increase with $j$, driven by the increasing sequence of ratios $c_j$. We suggest this is as surprising as the HDLSS geometric representation results discovered by Hall et al. (2005) [8], and further developed by Yata and Aoshima (2012) [19].

Second, we extend the new results to multi-spike cases where the population eigenvalues are asymptotically indistinguishable. We study the angle between the corresponding sample eigenvectors and the subspace spanned by the indistinguishable population eigenvectors. In HDLSS contexts, the cone angles are always random variables, whereas such randomness disappears when the sample size increases. We also show that in HDLSS settings, the PC scores are not consistent even when the angles between the sample eigenvectors and their population counterparts converge to 0.

Next we introduce two illustrative examples to help understand the main theoretical results in the paper, where the eigenvalues are asymptotically distinguishable (Example 1.1) and indistinguishable (Example 1.2), respectively. Our theorems are applicable to a much broader class of general spike models.

Example 1.1. (Multiple-component spike models with distinguishable eigenvalues) Assume that $X_1, \ldots, X_n$ are random sample vectors from a $d$-dimensional normal distribution $N(0, \Sigma)$, where the population eigenvalues have the following properties: as $n \to \infty$,

$$\lambda_1 > \lambda_2 > \lambda_3 \gg \lambda_4 = \cdots = \lambda_d = 1, \qquad \frac{d}{n\lambda_j} \to c_j,\ j = 1, 2, 3, \quad \text{with } 0 < c_1 < c_2 < c_3 < \infty. \tag{1.1}$$

Fig 1. Geometric representation of PC directions in Example 1.1. The sphere represents the space of possible sample eigenvectors. Panel (A) shows that the first sample eigenvector tends to lie in the red cone, with angle $\theta_1$. Similarly, Panels (B) and (C) show that the second and third sample eigenvectors tend to lie in the blue and gray cones, whose angles are $\theta_2$ and $\theta_3$, respectively. Note that the angle of the red cone is less than that of the blue cone, which in turn is less than that of the gray cone.



In Figure 1, the sphere represents the space of all possible sample eigen- 
directions, with the first three population eigenvectors as the coordinate axes. 
For this particular example, our general Theorem 3.1 suggests that 

• As $n \to \infty$, the sample eigenvector $\hat u_1$ lies in the red cone shown in Panel (A) of Fig. 1, where the angle of the cone is $\theta_1 = \arccos\{(1+c_1)^{-1/2}\}$. Similarly, as $n \to \infty$, the sample eigenvectors $\hat u_2$ and $\hat u_3$ lie in the blue and dark gray cones shown in Panels (B) and (C) of Fig. 1, whose angles are $\theta_2 = \arccos\{(1+c_2)^{-1/2}\}$ and $\theta_3 = \arccos\{(1+c_3)^{-1/2}\}$, respectively. Note that since $c_1 < c_2 < c_3$, we have $\theta_1 < \theta_2 < \theta_3$, as shown in Figure 1.

In addition, our Proposition 3.1 includes the two boundary cases studied by Shen et al. (2012) [17] as special cases:

• When $c_1 = c_2 = c_3 = 0$, it follows that $\theta_1 = \theta_2 = \theta_3 = 0$. This puts us in the domain of consistency [17].
• In the opposite boundary case of $c_1 = c_2 = c_3 = \infty$, we have $\theta_1 = \theta_2 = \theta_3 = 90$ degrees. This leads to strong inconsistency [17].

Hence, our new results go well beyond the work of [17], and completely 
characterize the transition between consistency and strong inconsistency. 
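The limiting cone angles $\theta_j = \arccos\{(1+c_j)^{-1/2}\}$ of Example 1.1 and its two boundary cases are easy to evaluate. The short Python sketch below is our own illustration, and the helper name `cone_angle_deg` is hypothetical. It uses the equivalent form $\tan\theta_j = \sqrt{c_j}$, which follows from $\cos\theta_j = (1+c_j)^{-1/2}$ and handles $c_j = \infty$ directly. The ratios $0.2, 0.4, 1$ are those used in the simulation study of this example.

```python
import math

def cone_angle_deg(c: float) -> float:
    """Limiting angle (in degrees) between a sample eigenvector and its population
    counterpart when d / (n * lambda_j) -> c, i.e. arccos((1 + c)^(-1/2)).

    Computed as arctan(sqrt(c)), an equivalent form; c = math.inf gives 90 degrees.
    """
    if c < 0:
        raise ValueError("c must be non-negative")
    if math.isinf(c):
        return 90.0
    return math.degrees(math.atan(math.sqrt(c)))

# Ratios c_1, c_2, c_3 used for Example 1.1 in the simulation study.
for j, c in enumerate([0.2, 0.4, 1.0], start=1):
    print(f"theta_{j} = {cone_angle_deg(c):.2f} degrees")

# Boundary cases: consistency (c = 0) and strong inconsistency (c = infinity).
print(cone_angle_deg(0.0), cone_angle_deg(math.inf))
```

For $(c_1, c_2, c_3) = (0.2, 0.4, 1)$ this gives approximately 24.1, 32.3 and 45 degrees.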



[Figure 2: Panel (A) and Panel (B); horizontal axes: Angle (degrees).]

Fig 2. Example 1.1: Simulated angles between sample and population eigenvectors. Panel (A) shows realizations of angles between sample and population eigenvectors as colored dots (red is first, blue is second, gray is third). Distributions are studied using kernel density estimates, and compared with the theoretical values $\theta_j$ for $j = 1, 2, 3$, shown as dashed lines. Panel (B) studies the randomness of eigen-directions within the cones shown in Figure 1, by showing the distribution of pairwise angles between realizations of the sample eigenvectors. All 3 colors are overlaid here, and all angles are very close to 90 degrees, which is consistent with the randomness of the respective sample eigenvectors within the cones.

We investigated this theoretical convergence using simulations over a range of settings, with $n = 50, 100, 200, 500, 1000, 2000$, where $d/n = 50$, and $c_1 = 0.2$, $c_2 = 0.4$, $c_3 = 1$. The full sequence, illustrating this convergence, is shown in Figure A of the supplementary material [18]. Figure 2 shows the intermediate case of $n = 200$. For one data set with this distribution, we compute the angles between the sample and population eigenvectors. Repeating this procedure over 100 replications, we get 100 angles for each of the first three eigenvectors, which are shown as red, blue and gray points in Panel (A). The red, blue and gray curves are the corresponding kernel density estimates. Panel (A) shows that the simulated angles are very close to the corresponding theoretical angles $\theta_j$, $j = 1, 2, 3$, shown as dashed vertical lines.
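A scaled-down version of this experiment can be sketched with NumPy. This is our own illustrative code, not the authors' simulation. It makes the following choices:

- The population eigenvectors are the coordinate axes. This loses no generality for angles, as noted in Section 5.
- The spikes are set to $\lambda_j = d/(nc_j)$, so that $d/(n\lambda_j) = c_j$ exactly, and all other eigenvalues equal 1.
- It uses $n = 200$ and $d/n = 50$, but only 10 replications to keep the run short.

The function name `simulate_angles` is hypothetical. The code also reports $\hat\lambda_j/\lambda_j$. Heuristically this should be near $1 + c_j$, because the noise contributes roughly $d/n$ to each leading sample eigenvalue.

```python
import numpy as np

def simulate_angles(n=200, d=10_000, c=(0.2, 0.4, 1.0), reps=10, seed=0):
    """Average angles (degrees) between the first m = len(c) sample eigenvectors
    and e_1, ..., e_m, plus average ratios lambda_hat_j / lambda_j, under a
    Gaussian spike model with lambda_j = d / (n * c_j) and all other population
    eigenvalues equal to 1 (population eigenvectors u_j = e_j)."""
    rng = np.random.default_rng(seed)
    c = np.asarray(c, dtype=float)
    m = len(c)
    lam = np.ones(d)
    lam[:m] = d / (n * c)                      # so that d / (n * lambda_j) = c_j
    angles = np.empty((reps, m))
    ratios = np.empty((reps, m))
    for r in range(reps):
        # Rows are the observations X_i ~ N(0, diag(lam)), i.e. X^T = Z Lambda^{1/2}.
        Xt = rng.standard_normal((n, d)) * np.sqrt(lam)
        # Right singular vectors of X^T / sqrt(n) are eigenvectors of X X^T / n.
        _, s, vh = np.linalg.svd(Xt / np.sqrt(n), full_matrices=False)
        for j in range(m):
            cos = min(abs(vh[j, j]), 1.0)      # |<u_hat_j, e_j>|; the sign is arbitrary
            angles[r, j] = np.degrees(np.arccos(cos))
            ratios[r, j] = s[j] ** 2 / lam[j]  # lambda_hat_j / lambda_j
    return angles.mean(axis=0), ratios.mean(axis=0)

c = np.array([0.2, 0.4, 1.0])
mean_angles, mean_ratios = simulate_angles(c=c)
theory = np.degrees(np.arccos(1.0 / np.sqrt(1.0 + c)))
print("simulated angles:", mean_angles.round(1), "limits:", theory.round(1))
print("lambda_hat / lambda:", mean_ratios.round(2), "heuristic 1 + c_j:", 1.0 + c)
```

Because $n$ is finite, the averaged angles typically sit within a few degrees of the limits rather than exactly on them.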

Panel (B) in Figure 2 studies the randomness of eigen-directions within the cones shown in Figure 1. We calculate pairwise angles between realizations of the sample eigenvectors for the three cones, showing angles and kernel density estimates using colors as in Panel (A) of Figure 2. All angles are very close to 90 degrees, which is consistent with randomness in high dimensions; see [8, 10, 11, 19] and the more recent work of Cai et al. (2013) [6]. In fact, the regions represented by circles in Figure 1 are actually $(d-1)$-dimensional hyperspheres, so the sample eigenvectors should be thought of as $(d-1)$-dimensional as $d, n \to \infty$.
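The near-90-degree pairwise angles in Panel (B) reflect a general high-dimensional fact. For two independent, uniformly distributed unit vectors in $\mathbb{R}^d$, the inner product has mean 0 and variance $1/d$, so their angle concentrates at 90 degrees. The pure-Python sketch below illustrates only this fact. Treating the random part of the sample eigenvectors within a cone as such independent directions is our simplifying assumption, and the helper names are hypothetical.

```python
import math
import random

def random_unit_vector(d, rng):
    """Uniform direction on the unit sphere in R^d, via a normalized Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def angle_deg(a, b):
    """Angle in degrees between two unit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

rng = random.Random(2013)
d = 2000
vectors = [random_unit_vector(d, rng) for _ in range(10)]
pairwise = [angle_deg(vectors[i], vectors[k])
            for i in range(len(vectors)) for k in range(i + 1, len(vectors))]
print(f"min {min(pairwise):.1f}, max {max(pairwise):.1f} degrees")
```

The spread of the angles around 90 degrees shrinks like $1/\sqrt{d}$ as the dimension grows.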

Example 1.2. (Multiple-component spike models with indistinguishable eigenvalues) We again assume that $X_1, \ldots, X_n$ are random sample vectors from a $d$-dimensional normal distribution $N(0, \Sigma)$. Different from Example 1.1, the six leading population eigenvalues of $\Sigma$ fall into three asymptotically separable pairs as follows: as $n \to \infty$,

$$\lambda_1 = \lambda_2 > \lambda_3 = \lambda_4 > \lambda_5 = \lambda_6 \gg \lambda_7 = \cdots = \lambda_d = 1, \qquad \frac{d}{n\lambda_{2j}} \to c_j,\ j = 1, 2, 3, \quad \text{with } 0 < c_1 < c_2 < c_3 < \infty.$$



[Figure 3: Panel (A), Panel (B), Panel (C).]

Fig 3. Example 1.2: Geometric representation of PC directions. Panel (A) shows, in red, the cone to which the first group of sample eigenvectors converges. This cone has angle $\theta_1$ with the gray subspace generated by the first group of population eigenvectors. Similarly, Panel (B) (Panel (C)) shows, as a blue (dark gray) cone, the cone to which the second (third) group of sample eigenvectors converges; it has angle $\theta_2$ ($\theta_3$) with the subspace generated by the second (third) group of population eigenvectors.

Our general Theorem 3.2, when applied to the current example, reveals the following insights:

• Panel (A) in Figure 3 shows, as a red cone, the region where the first group of sample eigenvectors $\hat u_1$ and $\hat u_2$ lie in the limit as $n \to \infty$. This cone has angle $\theta_1 = \arccos\{(1+c_1)^{-1/2}\}$ with the gray subspace generated by the first group of population eigenvectors $u_1$ and $u_2$. Similarly, Panel (B) (Panel (C)) presents, as a blue (gray) cone, the region where the second (third) group of sample eigenvectors $\hat u_3$ and $\hat u_4$ ($\hat u_5$ and $\hat u_6$) lie in the limit as $n \to \infty$. This cone has angle $\theta_2 = \arccos\{(1+c_2)^{-1/2}\}$ ($\theta_3 = \arccos\{(1+c_3)^{-1/2}\}$) with the subspace generated by the second (third) group of population eigenvectors $u_3$ and $u_4$ ($u_5$ and $u_6$). Note that since $c_1 < c_2 < c_3$, we have $\theta_1 < \theta_2 < \theta_3$, as shown in Figure 3.

Furthermore, our Proposition B.1 in the supplementary document [18] considers boundary cases of our general framework, which include the results of Shen et al. (2012) [17] as special cases:

• For $c_1 = c_2 = c_3 = 0$, it follows that $\theta_1 = \theta_2 = \theta_3 = 0$. This puts us in the domain of subspace consistency, as studied in Theorem 4.3 of [17].

• When $c_1 = c_2 = c_3 = \infty$, we have $\theta_1 = \theta_2 = \theta_3 = 90$ degrees. This leads to strong inconsistency, as studied in Theorem 4.3 of [17].



The rest of the paper is organized as follows. Section 2 introduces the assumptions and notation relevant to the theorems in the paper. Section 3 studies the asymptotic properties of PCA for multiple spike models with distinguishable (or indistinguishable) eigenvalues as $n \to \infty$. Section 4 studies the asymptotic properties of PCA in HDLSS contexts. Section 5 contains the technical proofs of the main theorems. Additional simulation studies and proofs can be found in the supplementary document [18].

2. Assumptions and Notation. Let $X_1, \ldots, X_n$ be random vectors from a $d$-dimensional normal distribution $N(\xi, \Sigma)$, where $\xi$ is a $d \times 1$ mean vector and $\Sigma$ is a $d \times d$ covariance matrix. Let $\{(\lambda_k, u_k) : k = 1, \ldots, d\}$ be the eigenvalue-eigenvector pairs of $\Sigma$ such that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d > 0$. Thus, $\Sigma$ has the following eigen-decomposition

$$\Sigma = U \Lambda U^T,$$

where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_d)$ and $U = [u_1, \ldots, u_d]$. Since the relative sizes, rather than the absolute values, of the population eigenvalues affect the asymptotic properties of PCA, we assume that $\lambda_d = 1$ throughout the rest of the paper.

Let $\bar X$ be the sample mean. As discussed in [16],

$$\sum_{i=1}^{n} (X_i - \bar X)(X_i - \bar X)^T \ \text{ has the same distribution as } \ \sum_{i=1}^{n-1} Y_i Y_i^T, \tag{2.1}$$

where the $Y_i$ are i.i.d. random vectors from $N(0, \Sigma)$. It follows from (2.1) that the sample covariance matrix is location invariant. Thus, we can assume without loss of generality (WLOG):
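Location invariance can be checked directly: adding a common shift $\xi$ to every observation leaves the centered cross-product matrix on the left of (2.1) unchanged. The pure-Python sketch below is our own check of that fact only; it does not verify the distributional statement in (2.1). The function name `centered_scatter` and the toy data are hypothetical.

```python
def centered_scatter(xs):
    """Return sum_i (x_i - xbar)(x_i - xbar)^T for a list of d-dimensional points."""
    n, d = len(xs), len(xs[0])
    xbar = [sum(x[k] for x in xs) / n for k in range(d)]
    centered = [[x[k] - xbar[k] for k in range(d)] for x in xs]
    return [[sum(v[a] * v[b] for v in centered) for b in range(d)] for a in range(d)]

data = [[1.0, 2.0, 0.5], [0.0, -1.0, 2.0], [3.0, 0.5, -1.0], [2.0, 1.0, 1.0]]
shift = [10.0, -4.0, 7.5]                              # an arbitrary mean vector xi
shifted = [[x + s for x, s in zip(row, shift)] for row in data]

a, b = centered_scatter(data), centered_scatter(shifted)
max_diff = max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))
print(f"largest entrywise difference: {max_diff:.2e}")
```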
Assumption 2.1. $X_1, \ldots, X_n$ are i.i.d. random vectors from a $d$-dimensional normal distribution $N(0, \Sigma)$.

Denote the $j$th normalized population PC score vector as

$$s_j = (s_{1j}, \ldots, s_{nj})^T = \lambda_j^{-1/2}(u_j^T X_1, \ldots, u_j^T X_n)^T, \quad j = 1, \ldots, d, \tag{2.2}$$

and define $Z$ as the $n \times d$ random matrix

$$Z = (z_{ij})_{n \times d} = X^T U \Lambda^{-1/2}, \tag{2.3}$$

where $X = [X_1, \ldots, X_n]$ and the $z_{ij}$, $i = 1, \ldots, n$, $j = 1, \ldots, d$, are i.i.d. random variables from $N(0, 1)$.

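Let $\{(\hat\lambda_k, \hat u_k) : k = 1, \ldots, d\}$ be the eigenvalue-eigenvector pairs of the sample covariance matrix $\hat\Sigma = n^{-1} X X^T$ such that $\hat\lambda_1 \ge \hat\lambda_2 \ge \cdots \ge \hat\lambda_d$. Thus, $\hat\Sigma$ can be decomposed as

$$\hat\Sigma = \hat U \hat\Lambda \hat U^T, \tag{2.4}$$

where $\hat\Lambda = \mathrm{diag}(\hat\lambda_1, \ldots, \hat\lambda_d)$ and $\hat U = [\hat u_1, \ldots, \hat u_d]$. Note that the data matrix $n^{-1/2} X$ has the singular value decomposition $n^{-1/2} X = \sum_{j=1}^{n \wedge d} \hat\lambda_j^{1/2} \hat u_j \hat v_j^T$, where $\hat v_j = (\hat v_{1j}, \ldots, \hat v_{nj})^T$ for $j = 1, \ldots, n \wedge d$. Thus, the $j$th normalized sample PC score vector is given by

$$\hat s_j = (\hat s_{1j}, \ldots, \hat s_{nj})^T = \sqrt{n}\,(\hat v_{1j}, \ldots, \hat v_{nj})^T, \quad j = 1, \ldots, n \wedge d. \tag{2.5}$$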
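The relations between (2.4), the singular value decomposition of $n^{-1/2}X$, and the sample scores can be verified numerically. The NumPy sketch below is our own check on simulated data with arbitrary sizes and seed. It confirms two facts:

- The squared singular values of $n^{-1/2}X$ are the eigenvalues of $n^{-1}XX^T$.
- $\sqrt{n}\,\hat v_j = \hat\lambda_j^{-1/2}(\hat u_j^T X_1, \ldots, \hat u_j^T X_n)^T$, i.e. the sample analogue of the normalized population scores in (2.2).

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 8, 5
X = rng.standard_normal((d, n))          # columns are the observations X_1, ..., X_n
k = min(n, d)

# (2.4): eigenvalues of n^{-1} X X^T in decreasing order.
lam_hat = np.sort(np.linalg.eigvalsh(X @ X.T / n))[::-1]

# SVD of n^{-1/2} X: left singular vectors u_hat_j, squared singular values lam_hat_j.
u, s, vt = np.linalg.svd(X / np.sqrt(n), full_matrices=False)
assert np.allclose(s[:k] ** 2, lam_hat[:k])
assert np.allclose((u * s) @ vt, X / np.sqrt(n))    # sum_j lam_hat_j^{1/2} u_hat_j v_hat_j^T

# sqrt(n) * v_hat_j equals lam_hat_j^{-1/2} (u_hat_j^T X_1, ..., u_hat_j^T X_n)^T.
scores_svd = np.sqrt(n) * vt[:k].T                   # column j holds the j-th score vector
scores_proj = (X.T @ u[:, :k]) / s[:k]
print("scores agree:", np.allclose(scores_svd, scores_proj))
```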

We introduce an asymptotic notation. Assume that $\{\zeta_k : k = 1, \ldots, \infty\}$ ($k = n$ or $d$) is a sequence of random variables and $\{e_k : k = 1, \ldots, \infty\}$ is a sequence of constants. Denote $\zeta_k = O_{a.s.}(e_k)$ if $\limsup_{k \to \infty} |\zeta_k / e_k| \le C$ almost surely, where $P(0 < C < \infty) = 1$.



3. Growing sample size asymptotics. We now study asymptotic properties of PCA as $n \to \infty$. We consider multiple component spike models with distinguishable population eigenvalues in Section 3.1 and with indistinguishable eigenvalues in Section 3.2. Moreover, we vary $d$ from the classical fixed-$d$ asymptotics, through the random matrix version with $d \sim n$, all the way to the high dimension, medium sample size (HDMSS) asymptotics of Cabanski et al. (2010) [5] and Yata and Aoshima (2012) [20] with $d \gg n \to \infty$.
3.1. Multiple component spike models with distinguishable eigenvalues. We consider multiple component spike models with $m$ dominating spikes, where $m \in [1, n \wedge d]$ is finite. The population eigenvalues are assumed to satisfy the following two assumptions:

A1. As $n \to \infty$, $\lambda_1 > \cdots > \lambda_m \gg \lambda_{m+1} \to \cdots \to \lambda_d = 1$.

A2. As $n \to \infty$, $d/(n\lambda_j) \to c_j$, $j = 1, \ldots, m$, where $0 < c_1 < \cdots < c_m < \infty$.

We first make several comments about Assumptions A1 and A2.

• Assumption A1 includes two separate parts:

(a) The $\lambda_1 > \cdots > \lambda_m$ part makes it possible to separately consider the first $m$ principal component signals and study the corresponding asymptotic properties.

(b) The $\lambda_m \gg \lambda_{m+1} \to \cdots \to \lambda_d = 1$ part enables clear separation of the signal (contained in the first $m$ components) from the noise (in the higher order components), which then helps to derive the asymptotic properties of the first $m$ sample eigenvalues, eigenvectors, and PC scores.

• Assumption A2 is the critical case, in which the positive information (sample size and spike size) and the negative information (dimension) are of the same order. In particular, increasing $n$ and the spike size positively impacts the consistency of PCA, whereas increasing $d$ has a negative impact.
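For simulation, Assumptions A1 and A2 can be met exactly at a given $(n, d)$ by setting $\lambda_j = d/(nc_j)$ for the spikes and $\lambda_j = 1$ otherwise. The sketch below is our own helper for building such a population spectrum, and the name `spiked_eigenvalues` is hypothetical. For $n = 200$, $d = 10000$ and $(c_1, c_2, c_3) = (0.2, 0.4, 1)$ it gives the spikes 250, 125 and 50.

```python
def spiked_eigenvalues(n, d, c):
    """Population eigenvalues with spikes lambda_j = d / (n * c_j) for j <= m and
    lambda_j = 1 for j > m, so that d / (n * lambda_j) = c_j exactly.

    Requires 0 < c_1 < ... < c_m, which yields lambda_1 > ... > lambda_m.
    """
    if any(ci <= 0 for ci in c) or any(a >= b for a, b in zip(c, c[1:])):
        raise ValueError("need 0 < c_1 < c_2 < ... < c_m")
    spikes = [d / (n * ci) for ci in c]
    if spikes[-1] <= 1:
        raise ValueError("spikes must exceed the noise level 1")
    return spikes + [1.0] * (d - len(c))

lam = spiked_eigenvalues(n=200, d=10_000, c=[0.2, 0.4, 1.0])
print(lam[:5], len(lam))
```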




[Figure 4: the subspace $\mathbb{S} = \mathrm{span}\{u_j, j \in H\}$.]

Fig 4. Angle between the sample eigenvector $\hat u_j$ and the space $\mathbb{S}$. The blue vector is the projection of the red vector $\hat u_j$ onto the space $\mathbb{S}$.

While the main focus of our results is the signal eigenvectors, some notation for the noise eigenvectors is also useful. According to Assumption A1, the noise sample eigenvalues whose indices are greater than $m$ cannot be asymptotically distinguished, so the corresponding eigenvectors should be treated as a whole. Therefore, we define the noise index set $H = \{m+1, \ldots, d\}$, and denote the space spanned by the corresponding population noise eigenvectors as

$$\mathbb{S} = \mathrm{span}\{u_j, j \in H\}. \tag{3.1}$$

For each sample eigenvector $\hat u_j$, $j \in H$, we study the angle between $\hat u_j$ and the space $\mathbb{S}$, as defined in [10, 17] and illustrated in Figure 4, i.e. the angle between $\hat u_j$ (the red vector) and its projection onto $\mathbb{S}$ (the blue vector).
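Computationally, if a subspace has an orthonormal basis $\{b_1, \ldots, b_q\}$, the angle between a unit vector $\hat u$ and that subspace satisfies $\cos^2(\text{angle}) = \sum_{i=1}^q \langle \hat u, b_i\rangle^2$. This is the squared length of the projection shown as the blue vector in Figure 4. The pure-Python sketch below is our own illustration. It assumes the basis is already orthonormal, and the function name is hypothetical.

```python
import math

def angle_to_subspace_deg(u, basis):
    """Angle (degrees) between the unit vector u and span(basis).

    `basis` must be a list of orthonormal vectors; the squared cosine of the angle
    is the squared norm of the orthogonal projection of u onto the span.
    """
    cos_sq = sum(sum(ui * bi for ui, bi in zip(u, b)) ** 2 for b in basis)
    return math.degrees(math.acos(math.sqrt(min(1.0, cos_sq))))

# In R^3 with S = span{e_2, e_3}: a vector at 30 degrees from e_1 makes a
# 60-degree angle with S, and e_2 itself lies in S.
e2, e3 = [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
u = [math.cos(math.radians(30)), math.sin(math.radians(30)), 0.0]
print(round(angle_to_subspace_deg(u, [e2, e3]), 6), angle_to_subspace_deg(e2, [e2, e3]))
```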

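The following theorem derives the asymptotic properties of the first $m$ sample eigenvalues and eigenvectors. In addition, the theorem shows that, for $j = m+1, \ldots, n \wedge d$, the angle between $\hat u_j$ and $u_j$ goes to 90 degrees, whereas the angle between $\hat u_j$ and the space $\mathbb{S}$ goes to 0.

Theorem 3.1. Under Assumptions 2.1, A1, and A2, as $n \to \infty$, the sample eigenvalues satisfy

$$\frac{\hat\lambda_j}{\lambda_j} \xrightarrow{a.s.} 1 + c_j, \quad 1 \le j \le m; \qquad \frac{n\hat\lambda_j}{d} \xrightarrow{a.s.} 1, \quad m+1 \le j \le n \wedge d, \tag{3.2}$$

and the sample eigenvectors satisfy

$$\begin{aligned} &|\langle \hat u_j, u_j \rangle| \xrightarrow{a.s.} (1 + c_j)^{-1/2}, && 1 \le j \le m, \\ &|\langle \hat u_j, u_j \rangle| = O_{a.s.}\{(n/d)^{1/2}\}, && m+1 \le j \le n \wedge d, \\ &\text{angle}\langle \hat u_j, \mathbb{S} \rangle \xrightarrow{a.s.} 0, && m+1 \le j \le n \wedge d. \end{aligned} \tag{3.3}$$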



We now offer several remarks regarding Theorem 3.1. 

Remark 3.1. The results of (3.2) and (3.3) suggest that, as the eigenvalue index increases, the proportional bias between the sample and population eigenvalues increases, and so does the angle between the sample and corresponding population eigenvectors. This is because larger eigenvalues (i.e. those with smaller indices) contain more positive information, which makes the corresponding sample eigenvalues and eigenvectors less biased. These results are graphically illustrated in Figure 1 and empirically verified in Figure 2, for the specific model in Example 1.1. More empirical support is provided in the supplementary material [18].

Remark 3.2. Theorem 3.1 can be extended to include the classical and random matrix cases, by allowing $c_{m_0} = 0$ for some $m_0 \le m$, which suggests that positive information dominates in the leading $m_0$ spikes. Then Assumptions A1 and A2 respectively become

A3. As $n \to \infty$, the population eigenvalues satisfy $\lambda_1 > \cdots > \lambda_{m_0} \gg \lambda_{m_0+1} > \cdots > \lambda_m \gg \lambda_{m+1} \to \cdots \to \lambda_d = 1$.

A4. As $n \to \infty$, $d/(n\lambda_j) \to c_j$ for $j = 1, \ldots, m$, where $0 = c_1 = \cdots = c_{m_0} < c_{m_0+1} < \cdots < c_m < \infty$.

For the classical case with fixed dimension $d$, $m_0 = m = d$ in Assumptions A3 and A4. For random matrix cases with $n \sim d$, $m_0 = m$ in Assumptions A3 and A4. Since $c_1 = \cdots = c_{m_0} = 0$ in Assumption A4, the sample eigenvalues and eigenvectors whose index is less than or equal to $m_0$ are consistent. These results are summarized in the following Proposition 3.1(a).

Remark 3.3. Another extension of Theorem 3.1 is to allow $c_{m_0+1} = \infty$ for some $m_0 < m$, i.e. negative information dominates in the higher-order spikes. This contains the HDMSS cases [5, 20], where $d \gg n \to \infty$. Assumption A1 then becomes Assumption A3, and Assumption A2 becomes

A5. As $n \to \infty$, $d/(n\lambda_j) \to c_j$ for $j = 1, \ldots, m$, where $0 < c_1 < \cdots < c_{m_0} < c_{m_0+1} = \cdots = c_m = \infty$.

Since $c_{m_0+1} = \cdots = c_m = \infty$, for index $j \ge m_0 + 1$ the proportional error between the sample and population eigenvalues goes to infinity, and the angle between the corresponding sample and population eigenvectors converges to 90 degrees. These results are summarized in Proposition 3.1(b).

Proposition 3.1. (a) Under Assumptions 2.1, A3 and A4, the sample eigenvalues and eigenvectors satisfy

$$\hat\lambda_j/\lambda_j \xrightarrow{a.s.} 1 \quad \text{and} \quad |\langle \hat u_j, u_j \rangle| \xrightarrow{a.s.} 1, \quad 1 \le j \le m_0,$$

and the properties of the other sample eigenvalues and eigenvectors remain the same as in Theorem 3.1.
(b) Let $H = \{m_0+1, \ldots, d\}$ and define $\mathbb{S}$ as in (3.1). If Assumption A4 in (a) is replaced by Assumption A5, the sample eigenvalues satisfy

$$n\hat\lambda_j/d \xrightarrow{a.s.} 1, \quad m_0+1 \le j \le m,$$

and the sample eigenvectors satisfy

$$|\langle \hat u_j, u_j \rangle| = O_{a.s.}\{(n\lambda_j/d)^{1/2}\}, \qquad \text{angle}\langle \hat u_j, \mathbb{S} \rangle \xrightarrow{a.s.} 0, \qquad m_0+1 \le j \le m;$$

the properties of the other sample eigenvalues and eigenvectors remain the same as in Theorem 3.1.
(c) In addition, if Assumption A4 in (a) is strengthened to $d/\lambda_{m_0} \to 0$, then the sample PC scores satisfy

$$\hat s_{ij}/s_{ij} \xrightarrow{a.s.} 1, \quad i = 1, \ldots, n, \quad j = 1, \ldots, m_0.$$



3.2. Multiple component spike models with indistinguishable eigenvalues. We now consider spike models with the $m$ leading eigenvalues grouped into $r\ (\ge 1)$ tiers, each of which contains eigenvalues that are either the same or have the same limit. The eigenvalues within different tiers have different limits. Specifically, the first $m$ eigenvalues are grouped into $r$ tiers, in which there are $q_k$ eigenvalues in the $k$th tier such that $\sum_{k=1}^r q_k = m$. Define $q_0 = 0$, $q_{r+1} = d - \sum_{k=1}^r q_k$, and the index set of the eigenvalues in the $k$th tier as

$$H_k = \Big\{\sum_{l=0}^{k-1} q_l + 1,\ \sum_{l=0}^{k-1} q_l + 2,\ \ldots,\ \sum_{l=0}^{k-1} q_l + q_k\Big\}, \quad k = 1, \ldots, r+1. \tag{3.4}$$
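The index sets in (3.4) are consecutive blocks of indices. A small Python helper that builds $H_1, \ldots, H_{r+1}$ from $q_1, \ldots, q_r$ and $d$ may make the bookkeeping concrete. The helper name `tier_index_sets` is hypothetical, and it uses the paper's 1-based indexing. For the pairing in Example 1.2 with a toy $d = 10$, it returns $H_1 = \{1, 2\}$, $H_2 = \{3, 4\}$, $H_3 = \{5, 6\}$ and the noise set $H_4 = \{7, \ldots, 10\}$.

```python
def tier_index_sets(q, d):
    """Return [H_1, ..., H_{r+1}] as lists of 1-based indices, following (3.4).

    q = [q_1, ..., q_r] gives the tier sizes; the last set H_{r+1} collects the
    remaining q_{r+1} = d - sum(q) noise indices.
    """
    m = sum(q)
    if any(qk <= 0 for qk in q) or m > d:
        raise ValueError("tier sizes must be positive and sum to at most d")
    sizes = list(q) + [d - m]
    sets, start = [], 0              # start = q_0 + q_1 + ... + q_{k-1}
    for qk in sizes:
        sets.append(list(range(start + 1, start + qk + 1)))
        start += qk
    return sets

print(tier_index_sets([2, 2, 2], d=10))
```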

We make the following assumptions on the tiered eigenvalues:

B1. The eigenvalues in the $k$th tier have the same limit $\delta_k\ (> 0)$:

$$\lim_{n \to \infty} \frac{\lambda_j}{\delta_k} = 1, \quad j \in H_k, \ k = 1, \ldots, r.$$

B2. The eigenvalues in different tiers have different limits: as $n \to \infty$, $\delta_1 > \cdots > \delta_r \gg \lambda_{m+1} \to \cdots \to \lambda_d = 1$.

B3. The ratio between the dimension and the product of the sample size with the eigenvalues in the same tier converges to a constant: as $n \to \infty$, $d/(n\delta_k) \to c_k$, $k = 1, \ldots, r$, with $0 < c_1 < \cdots < c_r < \infty$.

Assumptions B2 and B3 are natural extensions of Assumptions A1 and A2. In Assumption B2, the signal contained in the first $r$ tiers of eigenvalues is well separated from the noise, and hence the asymptotic properties of the sample eigenvalues and eigenvectors in the first $r$ tiers can be obtained. Assumption B3 states that the positive information (sample size and spike size) and the negative information (dimension) are of the same order.

Since the sample eigenvalues within the same tier cannot be asymptotically identified, the corresponding sample eigenvectors are indistinguishable. For $j \in H_k$, in order to study the asymptotic properties of the sample eigenvector $\hat u_j$, we consider the angle between $\hat u_j$ and the subspace spanned by the population eigenvectors in the same tier, defined as

$$\mathbb{S}_k = \mathrm{span}\{u_j, j \in H_k\}. \tag{3.5}$$

Our theoretical results are summarized in the following theorem.

Theorem 3.2. Under Assumptions 2.1, B1, B2 and B3, as $n \to \infty$, the sample eigenvalues satisfy

$$\frac{\hat\lambda_j}{\lambda_j} \xrightarrow{a.s.} 1 + c_k, \quad j \in H_k,\ k = 1, \ldots, r; \qquad \frac{n\hat\lambda_j}{d} \xrightarrow{a.s.} 1, \quad m+1 \le j \le n \wedge d, \tag{3.6}$$

and the sample eigenvectors satisfy

$$\begin{aligned} &\text{angle}\langle \hat u_j, \mathbb{S}_k \rangle \xrightarrow{a.s.} \arccos\{(1+c_k)^{-1/2}\}, && j \in H_k,\ k = 1, \ldots, r, \\ &|\langle \hat u_j, u_j \rangle| = O_{a.s.}\{(n/d)^{1/2}\}, && m+1 \le j \le n \wedge d, \\ &\text{angle}\langle \hat u_j, \mathbb{S}_{r+1} \rangle \xrightarrow{a.s.} 0, && m+1 \le j \le n \wedge d. \end{aligned} \tag{3.7}$$

Theorem 3.2 is an extension of Theorem 3.1. For higher-order eigenvalues, the sample eigenvalues are more biased, while the angles between the sample eigenvectors and the subspaces spanned by their population counterparts in the same tiers are larger. See Figure 3 for an illustration of the specific model considered in Example 1.2. Theorem 3.2 can be extended to cover the classical, random matrix, and HDMSS cases, which is done in Section B of the supplementary material [18].

4. High dimension, low sample size asymptotics. We now study the asymptotic properties of PCA in the HDLSS context. In this case, the ratios between the sample eigenvalues and their population counterparts converge to non-degenerate random variables, as do the angles between the sample eigenvectors and the space spanned by the corresponding population eigenvectors. This phenomenon of random limits does not exist when $n$ increases to $\infty$, as shown in Section 3.

Since the sample size is fixed, we cannot distinguish the two types of spike models considered in Sections 3.1 and 3.2, respectively. Hence, we merge the model assumptions there into the following corresponding assumptions:

C1. For fixed $n$, as $d \to \infty$, $\lambda_1 \ge \cdots \ge \lambda_m \gg \lambda_{m+1} \to \cdots \to \lambda_d = 1$.
C2. For fixed $n$, as $d \to \infty$, $d/\lambda_j \to c_j$, $j = 1, \ldots, m$, with $0 < c_1 \le \cdots \le c_m < \infty$.
In particular, Assumption C1 is parallel to Assumptions A1, B1 and B2, while Assumption C2 corresponds to Assumptions A2 and B3.

As stated below in Theorem 4.1, the sample eigenvalues and eigenvectors converge to non-degenerate random variables rather than constants. We define several quantities in order to describe the limiting random variables. Define the $m \times d$ matrix

$$M = [C, O_{m \times (d-m)}]_{m \times d},$$

where $C = \mathrm{diag}\{c_1^{-1/2}, \ldots, c_m^{-1/2}\}$ is an $m \times m$ diagonal matrix and $O_{m \times (d-m)}$ is the $m \times (d-m)$ zero matrix. In addition, define the random matrix $W$ as

$$W = M Z^T Z M^T, \tag{4.1}$$

where $Z$ is defined in (2.3). The eigenvalues of the random matrix $W$ appear in the random limits of Theorem 4.1, as in (4.2) and (4.3).
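Since the last $d - m$ columns of $M$ are zero, $W = C Z_m^T Z_m C$, where $Z_m$ denotes the first $m$ columns of $Z$. The notation $Z_m$ is ours, introduced only for this sketch. So $W$ is an $m \times m$ matrix whose distribution depends on $n$ and $c_1, \ldots, c_m$ but not on $d$. The NumPy sketch below is our own illustration and uses only definition (4.1). For a fixed, hypothetical $n = 10$ and $(c_1, c_2) = (0.5, 2)$, repeated draws show that the eigenvalues $\lambda_j(W)$ stay genuinely random, with $E(W) = nC^2$.

```python
import numpy as np

def sample_W(n, c, rng):
    """One draw of W = M Z^T Z M^T from (4.1); only the first m = len(c) columns
    of Z matter because the remaining columns of M are zero."""
    C = np.diag(np.asarray(c, dtype=float) ** -0.5)   # C = diag(c_j^{-1/2})
    Zm = rng.standard_normal((n, len(c)))              # first m columns of Z
    return C @ Zm.T @ Zm @ C

rng = np.random.default_rng(7)
n, c = 10, [0.5, 2.0]
draws = np.array([np.sort(np.linalg.eigvalsh(sample_W(n, c, rng)))[::-1]
                  for _ in range(2000)])
print("mean of lambda_j(W):", draws.mean(axis=0).round(2))
print("std  of lambda_j(W):", draws.std(axis=0).round(2))
```

The large standard deviations, which do not depend on $d$, are what make the HDLSS limits in this section random rather than constant.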

Given the fixed sample size, the sample eigenvalues cannot be asymptotically distinguished, nor can the corresponding sample eigenvectors. To study the asymptotic behavior of the sample eigenvectors, we need to consider the space spanned by the corresponding population eigenvectors, as defined in (3.5), with the two index sets being $H_1 = \{1, \ldots, m\}$ and $H_2 = \{m+1, \ldots, d\}$.

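We are now ready to state the main theorem in the HDLSS context.

Theorem 4.1. Under Assumptions 2.1, C1 and C2, for fixed $n$, as $d \to \infty$, the sample eigenvalues satisfy

$$\frac{n\hat\lambda_j}{\lambda_j} \xrightarrow{a.s.} c_j\{\lambda_j(W) + 1\}, \quad 1 \le j \le m; \qquad \frac{n\hat\lambda_j}{d} \xrightarrow{a.s.} 1, \quad m+1 \le j \le n, \tag{4.2}$$

where $W$ is defined in (4.1), and the sample eigenvectors satisfy

$$\begin{aligned} &\text{angle}\langle \hat u_j, \mathbb{S}_1 \rangle \xrightarrow{a.s.} \arccos\big[\{1 + \lambda_j(W)^{-1}\}^{-1/2}\big], && 1 \le j \le m, \\ &|\langle \hat u_j, u_j \rangle| = O_{a.s.}(d^{-1/2}), && m+1 \le j \le n, \\ &\text{angle}\langle \hat u_j, \mathbb{S}_2 \rangle \xrightarrow{a.s.} 0, && m+1 \le j \le n. \end{aligned} \tag{4.3}$$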
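Three remarks are offered below regarding Theorem 4.1.

Remark 4.1. If $m = 1$ in Theorem 4.1, i.e. for single-component spike models, then $W = \chi_n^2/c_1$ with $\chi_n^2 = \sum_{i=1}^n z_{i1}^2$, and the first sample eigenvalue and eigenvector satisfy

$$\frac{n\hat\lambda_1}{\lambda_1} \xrightarrow{a.s.} \chi_n^2 + c_1, \qquad \text{angle}\langle \hat u_1, u_1 \rangle \xrightarrow{a.s.} \arccos\{(1 + c_1/\chi_n^2)^{-1/2}\},$$

where $\chi_n^2$ has the chi-square distribution with $n$ degrees of freedom. This result is consistent with Theorem 1 of Jung et al. (2012) [11].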

Remark 4.2. For $1 \le j \le m$, as the relative size of the eigenvalue decreases, the angle between $\hat u_j$ and $\mathbb{S}_1$ increases. However, this phenomenon is not as strong as in the growing sample size settings studied in Section 3, where the sample eigenvectors can be studied separately, and the corresponding angles have a non-random increasing order.

Remark 4.3. Assumption C2 can be relaxed to include boundary cases, in which there exists an integer $m_0 \in [1, m]$ such that $c_{m_0} = 0$, i.e. positive information dominates in the leading $m_0$ spikes; or $c_{m_0+1} = \infty$, i.e. negative information dominates in the remaining higher-order spikes. These theoretical results are presented in Section C of the supplementary material [18].

5. Proofs. We now provide some proofs of our theorems as $n \to \infty$. For the sake of space, we only present here the detailed proof for the properties of the sample eigenvectors, which is the most challenging part. In contrast to showing consistency or inconsistency of the sample eigenvectors, this proof requires a precise calculation of the degree of inconsistency, i.e. the limiting angles between the sample and population eigenvectors. We relegate the derivations regarding the sample eigenvalues to Section D of the supplementary material [18], which also contains proofs of Proposition 3.1 and Theorem 4.1, as well as extensions of Theorems 3.2 and 4.1.

The critical ideas of the proof are to first partition the sample eigenvector matrix $\hat U$ into sub-matrices corresponding to the index sets $H_k$. Then, through careful analysis, we explore the connections between the sample eigenvectors and eigenvalues, and use the sample eigenvalue properties to study the asymptotic properties of the sample eigenvectors.

WLOG, we assume that $\lambda_{m+1} = \cdots = \lambda_d = 1$. Due to the invariance property of the angle between the sample and population eigenvectors (see Shen et al. (2012) [17]), we assume WLOG that the population eigenvectors are $u_j = e_j$, $j = 1, \ldots, d$, where the $j$th component of $e_j$ equals 1 and the rest are zero. Writing $\hat u_{ij}$ for the $i$th entry of $\hat u_j$, it follows that the inner product between the sample and population eigenvectors satisfies

$$|\langle \hat u_j, u_j \rangle|^2 = |\langle \hat u_j, e_j \rangle|^2 = \hat u_{jj}^2,$$

and the angle between the sample eigenvector and the corresponding population subspace in (3.5) satisfies

$$\{\cos[\text{angle}(\hat u_j, \mathbb{S}_k)]\}^2 = \sum_{i \in H_k} \hat u_{ij}^2, \quad k = 1, \ldots, r+1. \tag{5.1}$$
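Because the sets $H_1, \ldots, H_{r+1}$ partition $\{1, \ldots, d\}$ and each $\hat u_j$ has unit length, the squared cosines in (5.1) sum to one over $k = 1, \ldots, r+1$. The pure-Python sketch below is our own check of this bookkeeping. It uses a hypothetical unit vector in $d = 6$ with tiers $H_1 = \{1, 2\}$, $H_2 = \{3\}$ and noise set $H_3 = \{4, 5, 6\}$, with 1-based indices as in the text.

```python
import math

def tier_cos_squared(u_hat, tiers):
    """Squared cosines of angle(u_hat, S_k) from (5.1), assuming u_k = e_k so that
    S_k is spanned by the coordinate axes listed (1-based) in tiers[k]."""
    return [sum(u_hat[i - 1] ** 2 for i in H) for H in tiers]

raw = [0.9, -0.3, 0.2, 0.1, 0.05, -0.2]
norm = math.sqrt(sum(x * x for x in raw))
u_hat = [x / norm for x in raw]                      # a unit "sample eigenvector"
tiers = [[1, 2], [3], [4, 5, 6]]

cos_sq = tier_cos_squared(u_hat, tiers)
angles = [math.degrees(math.acos(math.sqrt(v))) for v in cos_sq]
print([round(a, 2) for a in angles], round(sum(cos_sq), 12))
```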

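The population eigenvalues are grouped into $r+1$ tiers, and $H_k$ in (3.4) is the index set of the eigenvalues in the $k$th tier. Define

$$\hat U_{k,l} = (\hat u_{i,j})_{i \in H_k,\, j \in H_l}, \quad 1 \le k, l \le r+1.$$

Then the sample eigenvector matrix $\hat U$ can be expressed as

$$\hat U = [\hat u_1, \hat u_2, \ldots, \hat u_d] = \begin{pmatrix} \hat U_{1,1} & \hat U_{1,2} & \cdots & \hat U_{1,r+1} \\ \hat U_{2,1} & \hat U_{2,2} & \cdots & \hat U_{2,r+1} \\ \vdots & \vdots & \ddots & \vdots \\ \hat U_{r+1,1} & \hat U_{r+1,2} & \cdots & \hat U_{r+1,r+1} \end{pmatrix}. \tag{5.2}$$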



The proof of the asymptotic properties of the sample eigenvectors (3.7) 
depends on the asymptotic properties of the sample eigenvalues, as stated in 
(3.6) of Theorem 3.2, which are derived in the supplementary material [18]. 
The following proof considers two groups of sample eigenvectors separately. 
Section 5.1 obtains the asymptotic properties for the sample eigenvectors 
whose index is greater than m. Section 5.2 derives asymptotic properties for 
the sample eigenvectors whose index is less than or equal to m. 

5.1. Asymptotic properties of the sample eigenvectors $\hat u_j$ with $j > m$. We derive the asymptotic properties through the following two steps:

• First, we show that as $n \to \infty$, the angle between $\hat u_j$ and $u_j$ converges to 90 degrees:

$$|\langle \hat u_j, u_j \rangle|^2 = \hat u_{jj}^2 = O_{a.s.}(n/d), \quad j = m+1, \ldots, n \wedge d. \tag{5.3}$$

• Then, we show that as $n \to \infty$, the angle between $\hat u_j$ and the corresponding subspace $\mathbb{S}_{r+1}$ converges to 0, where $\mathbb{S}_{r+1}$ is defined as in (3.5):

$$\text{angle}\langle \hat u_j, \mathbb{S}_{r+1} \rangle \xrightarrow{a.s.} 0, \quad j = m+1, \ldots, n \wedge d. \tag{5.4}$$
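We now provide the proof for the first step. Denote $W = \Lambda^{-1/2} \hat U \hat\Lambda^{1/2}$ (unrelated to the matrix in (4.1)), where $\hat U$ is the sample eigenvector matrix and $\hat\Lambda$ is the sample eigenvalue matrix defined in (2.4). It follows from (2.3) and (2.4) that $W W^T = n^{-1} Z^T Z$, where $Z$ is defined in (2.3). Considering the $k$th diagonal entry of the two equal matrices $W W^T$ and $n^{-1} Z^T Z$, and noting that $w_{kj} = \lambda_k^{-1/2} \hat\lambda_j^{1/2} \hat u_{kj}$, it follows that

$$\lambda_k^{-1} \sum_{j=1}^d \hat\lambda_j \hat u_{kj}^2 = \sum_{j=1}^d w_{kj}^2 = \frac{1}{n} \sum_{i=1}^n z_{ik}^2, \quad k = 1, \ldots, d. \tag{5.5}$$

In addition, note that $n^{-1} \sum_{i=1}^n z_{ik}^2 \xrightarrow{a.s.} 1$ as $n \to \infty$, and $\hat\lambda_j = 0$ for $j > n \wedge d$. Combining the above with (5.5), we obtain that

$$\sum_{l=1}^r \sum_{j \in H_l} \hat\lambda_j \lambda_k^{-1} \hat u_{kj}^2 + \sum_{j=m+1}^{n \wedge d} \hat\lambda_j \lambda_k^{-1} \hat u_{kj}^2 \xrightarrow{a.s.} 1, \quad k = 1, \ldots, d. \tag{5.6}$$

Furthermore, since every term in (5.6) is non-negative, taking $k = j$ in (5.6) shows that, as $n \to \infty$,

$$\hat u_{jj}^2 \le \{1 + o(1)\}\, \frac{\lambda_j}{\hat\lambda_j} \ \text{ almost surely}, \quad j = m+1, \ldots, n \wedge d, \tag{5.7}$$

which, together with the asymptotic properties of the sample eigenvalues (3.6), yields (5.3).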

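We then move on to prove the second step. According to (5.1), we need to show that

$$\sum_{k=m+1}^d \hat u_{kj}^2 \xrightarrow{a.s.} 1, \quad j = m+1, \ldots, n \wedge d. \tag{5.8}$$

For $j \le n \wedge d$, the $j$th diagonal entry of $W^T W$ is non-zero and lies between the smallest and largest non-zero eigenvalues of $W^T W$; below, $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ denote the smallest and largest non-zero eigenvalues. Since $W^T W$ shares the same non-zero eigenvalues as $n^{-1} Z^T Z$, it follows that for $j = 1, \ldots, n \wedge d$,

$$\lambda_{\min}(n^{-1} Z^T Z) \le \hat\lambda_j \sum_{k=1}^d \lambda_k^{-1} \hat u_{kj}^2 = \sum_{k=1}^d w_{kj}^2 \le \lambda_{\max}(n^{-1} Z^T Z), \tag{5.9}$$

which yields that, for $j = m+1, \ldots, n \wedge d$,

$$\frac{1}{\hat\lambda_j} \lambda_{\min}(n^{-1} Z^T Z) \le \sum_{k=1}^d \lambda_k^{-1} \hat u_{kj}^2 \le \frac{1}{\hat\lambda_j} \lambda_{\max}(n^{-1} Z^T Z). \tag{5.10}$$

According to Lemma D.1 in the supplementary material [18] and the asymptotic properties of the sample eigenvalues (3.6), we have that, for $j = m+1, \ldots, n \wedge d$,

$$\frac{1}{\hat\lambda_j} \lambda_{\min}(n^{-1} Z^T Z) \xrightarrow{a.s.} 1 \quad \text{and} \quad \frac{1}{\hat\lambda_j} \lambda_{\max}(n^{-1} Z^T Z) \xrightarrow{a.s.} 1. \tag{5.11}$$

In addition, it follows from Assumption B2 that, for $j = m+1, \ldots, n \wedge d$,

$$\lambda_j \lambda_k^{-1} \to 0, \ k = 1, \ldots, m; \qquad \lambda_j \lambda_k^{-1} \to 1, \ k = m+1, \ldots, d. \tag{5.12}$$

Combining (5.10), (5.11), and (5.12), we have (5.8), which further leads to (5.4).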
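Both (5.5) and (5.9) are exact finite-sample facts, so they can be checked numerically. The NumPy sketch below is our own check, not part of the proof. It assumes $u_j = e_j$ as above and uses a small, hypothetical spiked spectrum with $d > n$. It forms the matrix $W = \Lambda^{-1/2}\hat U \hat\Lambda^{1/2}$ of this subsection from the eigen-decomposition of $n^{-1}XX^T$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 6, 15
lam = np.array([40.0, 20.0] + [1.0] * (d - 2))     # population eigenvalues, u_j = e_j
Z = rng.standard_normal((n, d))                    # z_ij i.i.d. N(0, 1), as in (2.3)
X = (Z * np.sqrt(lam)).T                           # X = Lambda^{1/2} Z^T, a d x n matrix

lam_hat, U_hat = np.linalg.eigh(X @ X.T / n)       # (2.4), ascending order is fine here
lam_hat = np.clip(lam_hat, 0.0, None)              # remove tiny negative round-off
W = np.diag(lam ** -0.5) @ U_hat @ np.diag(np.sqrt(lam_hat))

# (5.5): the k-th diagonal entries of W W^T and n^{-1} Z^T Z coincide.
lhs = (U_hat ** 2) @ lam_hat / lam                 # lambda_k^{-1} sum_j lam_hat_j u_kj^2
rhs = (Z ** 2).mean(axis=0)                        # n^{-1} sum_i z_ik^2
print("(5.5) holds:", np.allclose(lhs, rhs))

# (5.9): the non-zero diagonal entries of W^T W lie between the smallest and
# largest non-zero eigenvalues of n^{-1} Z^T Z.
diag = np.diag(W.T @ W)
eig = np.linalg.eigvalsh(Z.T @ Z / n)
nonzero = eig[eig > 1e-10 * eig.max()]
inside = diag[lam_hat > 1e-10 * lam_hat.max()]
ok = np.all((inside >= nonzero.min() - 1e-9) & (inside <= nonzero.max() + 1e-9))
print("(5.9) holds:", ok)
```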

5.2. Asymptotic properties of the sample eigenvectors $\hat u_j$ with $j \in [1, m]$. We need to prove that, for $j = 1, \ldots, m$, the angle between the sample eigenvector $\hat u_j$ and the corresponding population subspace $\mathbb{S}_l$, $j \in H_l$, converges to $\arccos\{(1+c_l)^{-1/2}\}$, $l = 1, \ldots, r$. According to (5.1), we only need to show that

$$\sum_{k \in H_l} \hat u_{kj}^2 \xrightarrow{a.s.} \frac{1}{1+c_l}, \quad j \in H_l, \ l = 1, \ldots, r. \tag{5.13}$$

Below, we provide the detailed proof of (5.13) for $l = 1$, and briefly illustrate how repeating the same procedure leads to (5.13) for $l \ge 2$.

In order to show (5.13) for $l = 1$, we need the following lemma about the asymptotic properties of the eigenvector matrix $\hat U$ in (5.2):

Lemma 5.1. Under the assumptions of Theorem 3.2, as $n \to \infty$, the rows of the eigenvector matrix $\hat U$ satisfy

$$\sum_{l=1}^r (1+c_l)\, c_h c_l^{-1} \sum_{j \in H_l} \hat u_{kj}^2 \xrightarrow{a.s.} 1, \quad k \in H_h, \ h = 1, \ldots, r, \tag{5.14}$$

and the columns of the eigenvector matrix $\hat U$ satisfy

$$\sum_{h=1}^r \sum_{k \in H_h} \hat u_{kj}^2 \xrightarrow{a.s.} \frac{1}{1+c_l}, \quad j \in H_l, \ l = 1, \ldots, r. \tag{5.15}$$

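In addition, we also have

$$\sum_{l=1}^r (1+c_l) \sum_{j \in H_l} \hat u_{kj}^2 \le 1 + o(1) \ \text{ almost surely}, \quad k \in H_1. \tag{5.16}$$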
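Lemma 5.1 is proven in Section D.3.3 of the supplementary material [18]. We now show how to use Lemma 5.1 to prove (5.13) for $l = 1$. Letting $h = 1$ in (5.14), we have that

$$\sum_{l=1}^r (1+c_l)\, c_1 c_l^{-1} \sum_{j \in H_l} \hat u_{kj}^2 \xrightarrow{a.s.} 1, \quad k \in H_1. \tag{5.17}$$

Note that $c_1 c_l^{-1} < 1$ for $l > 1$; comparing (5.16) with (5.17), we get that

$$\sum_{l=2}^r \sum_{j \in H_l} \hat u_{kj}^2 \xrightarrow{a.s.} 0, \quad k \in H_1, \tag{5.18}$$

which, together with (5.17), yields that

$$\sum_{k \in H_1} \sum_{j \in H_1} \hat u_{kj}^2 \xrightarrow{a.s.} \frac{q_1}{1+c_1}, \tag{5.19}$$

where $q_1$ is the number of eigenvalues in $H_1$ (3.4). Summing over $j \in H_1$ in (5.15), we have that

$$\sum_{h=1}^r \sum_{k \in H_h} \sum_{j \in H_1} \hat u_{kj}^2 \xrightarrow{a.s.} \frac{q_1}{1+c_1}. \tag{5.20}$$

It follows from (5.19) and (5.20) that

$$\sum_{h=2}^r \sum_{k \in H_h} \sum_{j \in H_1} \hat u_{kj}^2 \xrightarrow{a.s.} 0, \tag{5.21}$$

which, together with (5.15) for $l = 1$, yields

$$\sum_{k \in H_1} \hat u_{kj}^2 \xrightarrow{a.s.} \frac{1}{1+c_1}, \quad j \in H_1,$$

which is (5.13) for $l = 1$.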

We now prove (5.13) for $l = 2, \ldots, r$. Note that

• it follows from (5.21) that (5.14) becomes

$$\sum_{l=2}^r (1+c_l)\, c_h c_l^{-1} \sum_{j \in H_l} \hat u_{kj}^2 \xrightarrow{a.s.} 1, \quad k \in H_h, \ h = 2, \ldots, r; \tag{5.22}$$

• it follows from (5.18) that (5.15) becomes

$$\sum_{h=2}^r \sum_{k \in H_h} \hat u_{kj}^2 \xrightarrow{a.s.} \frac{1}{1+c_l}, \quad j \in H_l, \ l = 2, \ldots, r; \tag{5.23}$$

• similar to (5.16), we have

$$\sum_{l=2}^r (1+c_l) \sum_{j \in H_l} \hat u_{kj}^2 \le 1 + o(1) \ \text{ almost surely}, \quad k \in H_2. \tag{5.24}$$

Finally, combining (5.22), (5.23) and (5.24), we can prove (5.13) for $l = 2$. We can repeat the same procedure for $l = 3, \ldots, r$.

SUPPLEMENTARY MATERIAL 

Simulations and proofs 

(http://www.unc.edu/~dshen/BBPCA/BBPCASupplement.pdf). The supplementary material contains additional simulation results that empirically verify the theoretical convergence of the angles between sample eigenvectors and their population counterparts, reported in our theorems. We also provide detailed proofs for our theorems and their extensions under both the growing sample size and HDLSS contexts.